BioGraph: Linking Biological Bases Across Organisms

Authors

  • Luana Loubet Borges
  • André Santanchè
Abstract

Representing data as networks has been shown to be a powerful approach for data analysis in biodiversity, e.g., interactions among organisms and relations among genes and phenotypes. In this context, databases and repositories following a graph model (e.g., RDF) have been increasingly used to interconnect information and to support network-driven analysis. This kind of analysis usually requires gathering and linking data from several distinct and heterogeneous sources. In this work, we investigate this challenge in the context of biological bases focusing on the characterization of living organisms, especially their phenotypes and diseases. These bases include the rich diversity of Model Organism Databases (MODs), repositories specialized in a particular taxon that are widely used in biological and medical studies. We exploit a lightweight integration approach, inspired by the Linked Open Data initiative, mapping several biological bases into a unified graph database, our BioGraph, and linking key elements to offer an interconnected view over the data.

The development of computational methods to collect, analyze and store biological data has brought unprecedented opportunities to cross data from different organisms. However, there are two main challenges for this kind of analysis. First, data are stored in several distinct datasets, where each repository has its own representation, and the repositories are not interconnected. Second, it is not trivial to analyze such a large amount of data. This research addresses the problem of crossing data from different organisms, drawing on several databases. It involves creating a database, our BioGraph, to support the search and analysis of phenotypic data. Its main goal is to develop techniques to transform phenotypic data from distinct, heterogeneous data sources into a homogeneous format, linking them and crossing phenotype information of different organisms.
The construction of BioGraph can be summarized as follows:

1. We imported and linked several data sources: InterMine [1], Model Organism Databases (MODs), Uberon [2], Uberpheno, the Human Disease Ontology (DO), the Symptom Ontology (SYMP) and other ontologies. The data sources come in several formats, and this heterogeneity was a challenge: each MOD and dataset used to build BioGraph has its own format.
2. We created a unified database containing data from all these data sources. To solve the heterogeneity problem, we developed a unified model that supports different approaches to describing phenotypes.
3. We interlinked data from the several sources by combining two strategies: (i) exploiting existing cross references among sources; (ii) importing bridges between ontologies: Uberon and Uberpheno.
4. With the interlinked graph, we inferred new edges and nodes, generating knowledge.

Figure 1 shows an overview of BioGraph and how it is organized. It contains descriptions of phenotypes, Uberon entities, Gene Ontology terms, diseases, and symptoms. Edges labeled "link:uberon" and "link:uberpheno" are derived from Uberon and Uberpheno, respectively. Nodes and edges created by our inference process are highlighted in red.

Fig. 1. Domain Model.

The main contributions of this work are the unified model, which supports several descriptive approaches for phenotypes, and the unified graph database, which contains descriptions of phenotypes from 63 distinct data sources. Future work includes importing genes, linking them with their phenotypes and diseases, and implementing an interface for our system.
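
As a rough illustration of steps 3 and 4 above, the following minimal Python sketch stores labeled edges in memory, turns cross references and an ontology bridge into edges, and then infers new edges. It is not the authors' implementation: the Graph class, the function names, the "xref" and "inferred:same_entity" labels, all identifiers such as "MP:EXAMPLE1", and the inference rule itself (connecting two phenotypes that point to the same anatomical entity) are assumptions made for the example. Only the "link:uberon" edge label comes from the text.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory labeled directed graph (illustrative only)."""

    def __init__(self):
        self.edges = set()  # each edge is a (source, label, target) tuple

    def add(self, source, label, target):
        self.edges.add((source, label, target))


def link_by_xref(graph, xrefs):
    """Strategy (i): turn existing cross references into edges."""
    for source_id, target_id in xrefs:
        graph.add(source_id, "xref", target_id)


def import_bridge(graph, bridge, label):
    """Strategy (ii): import an ontology bridge as labeled edges."""
    for phenotype_id, entity_id in bridge:
        graph.add(phenotype_id, label, entity_id)


def infer_shared_entity_edges(graph, label):
    """Hypothetical inference rule: connect every pair of nodes that
    reach the same target through an edge with the given label."""
    sources_by_target = defaultdict(set)
    for source, edge_label, target in graph.edges:
        if edge_label == label:
            sources_by_target[target].add(source)

    inferred = set()
    for sources in sources_by_target.values():
        for a in sources:
            for b in sources:
                if a < b:  # emit each unordered pair exactly once
                    inferred.add((a, "inferred:same_entity", b))
    graph.edges |= inferred
    return inferred


# All identifiers below are made-up placeholders, not real ontology terms.
graph = Graph()
link_by_xref(graph, [("DISEASE:EXAMPLE1", "SYMPTOM:EXAMPLE1")])
import_bridge(
    graph,
    [("MP:EXAMPLE1", "UBERON:EXAMPLE1"), ("HP:EXAMPLE1", "UBERON:EXAMPLE1")],
    "link:uberon",
)
inferred = infer_shared_entity_edges(graph, "link:uberon")
```

In this toy run, the two placeholder phenotypes from different organisms both link to the same placeholder Uberon entity, so a single inferred edge connects them. A real system would likely use a graph database and richer rules than this pairwise match.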




Publication date: 2017